Computer vision algorithms include methods for acquiring, processing, analyzing, and understanding digital images, and for extracting data from the real world. It is an interdisciplinary field that deals with how computers can gain a high-level understanding of digital images, and it aims to mimic human vision.
Convolutional neural networks are now capable of outperforming humans on some computer vision tasks, such as image classification.
In this project, I provide a solution to the landmark recognition problem: given an input photo of a place anywhere in the world, the computer recognizes and labels the landmark at which the photo was taken.
The dataset I used for this project is the Kaggle Google Landmark Recognition 2019 dataset, which can be downloaded from the Common Visual Data Foundation's Google Landmarks Dataset v2.
import numpy as np
import pandas as pd
from IPython.display import display # Allows the use of display() for DataFrames
import matplotlib.pyplot as plt
# Pretty display for notebooks
%matplotlib inline
import sys, os
from os import path
import csv
import pickle #object binary serialization
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' #suppress tensorflow warnings
from keras.models import Sequential
from keras.layers import Activation, Dropout, Flatten, Dense, Conv2D, MaxPooling2D, GlobalAveragePooling2D
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.utils.np_utils import to_categorical
from keras.utils import plot_model
from keras import applications, optimizers
from keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img
from google.colab import drive
drive.mount('/gdrive')
cd /
%%shell
cp -v /gdrive/My\ Drive/shared/Udacity_MLND_capstone_dataset/data.tar.gz /
tar xvzf data.tar.gz
mkdir models
mkdir -p docs/figures
mkdir -p docs/stats
# Data paths definitions :
data_dir = "../data"
input_csv_dir = path.join(data_dir,"input_csv") #csv files that were downloaded from Kaggle
train_dir = path.join(data_dir, "train") ##Training images directory
validation_dir = path.join(data_dir, "validation") ##Validation images directory
test_dir = path.join(data_dir, "test") ##Test images directory
test_images_dir = path.join(data_dir, "test_images") #test images directory (unlabeled images used for prediction)
models_dir = "../models"
docs_dir = "../docs"
stats_dir = path.join(docs_dir, "stats") #Output csv statistics directory
figures_dir = path.join(docs_dir, "figures") #Output directory for saving figures
Did you ever go through your vacation photos and ask yourself: What is the name of this temple I visited in China? Who created this monument I saw in France? Landmark recognition can help! This technology can predict landmark labels directly from image pixels, to help people better understand and organize their photo collections.
This problem was inspired by Google Landmark Recognition 2019 Challenge on Kaggle.
Landmark recognition is a little different from other classification problems: it involves a much larger number of classes (a total of 15K classes in this challenge), and the number of training examples per class may not be very large. This makes landmark recognition challenging in its own way.
This problem is a multi-class classification problem. In this problem, I built a classifier that can be trained using the given dataset and can be used to predict the landmark class from a given input image.
I have chosen to use convolutional neural networks and transfer learning techniques as classifiers. CNNs yield better results than traditional computer vision algorithms. I trained a basic convolutional neural network, then used pre-trained VGG16 and Xception models in transfer learning to solve the Google Landmark Recognition 2019 Problem.
I will use two evaluation metrics for my project. Accuracy score, which is implemented in Keras, and Global Average Precision Metric (GAP).
The primary evaluation metric for this problem is the accuracy score. Since our dataset is balanced (not skewed), the accuracy score can be used successfully. Accuracy is the fraction of predictions our model got right. $$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$$
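As a toy NumPy check of the accuracy definition (made-up labels, not from the dataset):

```python
import numpy as np

# Toy example: 4 predictions, 3 of them correct.
y_true = np.array([3, 1, 2, 3])
y_pred = np.array([3, 1, 1, 3])

accuracy = np.mean(y_pred == y_true)  # fraction of correct predictions
print(accuracy)  # 0.75
```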
$$\texttt{accuracy}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples}-1} 1(\hat{y}_i = y_i)$$
The second metric, from Google Landmark Recognition 2019 - Evaluation Metric, is also known as micro Average Precision (microAP), as per F. Perronnin, Y. Liu, and J.-M. Renders, "A Family of Contextual Measures of Similarity between Distributions with Application to Image Retrieval," Proc. CVPR'09. It works as follows:
$$\text{GAP} = \frac{1}{M} \sum_{i=1}^{N} P(i)\,\mathrm{rel}(i)$$
where:
- $N$ is the total number of predictions, ranked in descending order of confidence score,
- $M$ is the number of test samples with a ground-truth label,
- $P(i)$ is the precision at rank $i$,
- $\mathrm{rel}(i)$ denotes the relevance of prediction $i$: 1 if it is correct, 0 otherwise.
The implementation of this metric was obtained from David Thaler GAP metric implementation on Kaggle.
# Script Source : [Kaggle - David Thaler - Gap Metric](https://www.kaggle.com/davidthaler/gap-metric)
def GAP_vector(pred, conf, true, return_x=False):
'''
Compute Global Average Precision (aka micro AP), the metric for the
Google Landmark Recognition competition.
This function takes predictions, labels and confidence scores as vectors.
In both predictions and ground-truth, use None/np.nan for "no label".
Args:
pred: vector of integer-coded predictions
conf: vector of probability or confidence scores for pred
true: vector of integer-coded labels for ground truth
return_x: also return the data frame used in the calculation
Returns:
GAP score
'''
x = pd.DataFrame({'pred': pred, 'conf': conf, 'true': true})
x.sort_values('conf', ascending=False, inplace=True, na_position='last')
x['correct'] = (x.true == x.pred).astype(int)
x['prec_k'] = x.correct.cumsum() / (np.arange(len(x)) + 1)
x['term'] = x.prec_k * x.correct
gap = x.term.sum() / x.true.count()
if return_x:
return gap, x
else:
return gap
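As a quick sanity check of the metric, here is a self-contained, hand-checkable toy example (it repeats the same computation on three predictions):

```python
import numpy as np
import pandas as pd

def gap_toy(pred, conf, true):
    # Same GAP computation as above, inlined so this snippet stands alone.
    x = pd.DataFrame({'pred': pred, 'conf': conf, 'true': true})
    x.sort_values('conf', ascending=False, inplace=True, na_position='last')
    x['correct'] = (x.true == x.pred).astype(int)
    x['prec_k'] = x.correct.cumsum() / (np.arange(len(x)) + 1)
    return (x.prec_k * x.correct).sum() / x.true.count()

# Ranked by confidence the predictions are: correct, wrong, correct.
# Precision@1 = 1/1, Precision@3 = 2/3, so GAP = (1 + 2/3) / 3 = 5/9.
score = gap_toy([0, 1, 2], [0.9, 0.8, 0.7], [0, 2, 2])
print(round(score, 4))  # 0.5556
```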
Here we define a class that records performance metrics to a CSV file, and initialize an instance of it:
class StatsCSV:
def __init__(self, csv_file):
self.csv_file = csv_file
with open(self.csv_file, 'w', newline='') as csvfile:
header_writer = csv.writer(csvfile)
header_writer.writerow(['Model', 'Test Loss', 'Test Accuracy', 'Test GAP'])
def add_stats(self,model_name, loss, accuracy, GAP):
with open(self.csv_file, 'a', newline='') as csvfile:
stats_writer = csv.writer(csvfile)
stats_writer.writerow([model_name, round(loss,4), round(accuracy * 100, 2), round(GAP * 100, 2)])
stats_csv = StatsCSV(path.join(stats_dir, "stats.csv"))
class IndexCSV:
def __init__(self, name, index_csv_dir, index_csv_file):
try:
self.index_data = pd.read_csv(path.join(index_csv_dir, index_csv_file), usecols=['id', 'category'])
print("File has {} samples with {} features each.".format(*self.index_data.shape))
except FileNotFoundError:
print("File could not be loaded. Is the dataset missing?")
self.name = name
self.fig = None
self.ax = None
self.freqs = None
def get_freqs(self):
self.freqs = self.index_data['category'].value_counts().to_frame()
self.freqs.columns = ['images_count']
return self.freqs
def get_plot(self):
#self.fig, self.ax = plt.subplots()
self.ax = self.index_data['category'].value_counts().plot.barh(title=self.name)
self.ax.set(xlabel='Images Count', ylabel='Landmark (Class) Name')
self.ax.grid(True)
return self.ax
Load the index CSV files:
index_train_csv = IndexCSV("Training Data", data_dir, "index_train.csv")
index_validation_csv = IndexCSV("Validation Data", data_dir, "index_validation.csv")
index_test_csv = IndexCSV("Test Data", data_dir, "index_test.csv")
landmarks_list = index_validation_csv.index_data.groupby('category').nunique().index.to_list()
landmarks_df = pd.DataFrame(landmarks_list , index=np.arange(1, len(landmarks_list) + 1), columns = ['Landmarks'])
display(landmarks_df)
landmarks_df.to_csv(os.path.join(stats_dir, "selected_landmarks.csv"), index=True) #Save to CSV
train_freqs = index_train_csv.get_freqs()
display(train_freqs)
train_freqs.to_csv(os.path.join(stats_dir, "train_freqs.csv"), index=True)
validation_freqs = index_validation_csv.get_freqs()
display(validation_freqs)
validation_freqs.to_csv(os.path.join(stats_dir, "validation_freqs.csv"), index=True)
test_freqs = index_test_csv.get_freqs()
display(test_freqs)
test_freqs.to_csv(os.path.join(stats_dir, "test_freqs.csv"), index=True)
The horizontal bar plots below show the frequency distribution of classes in the training, validation, and test sets. The $y$-axis shows the landmark categories and the $x$-axis the frequency of each category. From the plots we can see that the datasets are well balanced, with image samples almost uniformly distributed across classes:
plt.figure(figsize=(8, 14))
plt.subplot(311)
index_train_ax = index_train_csv.get_plot()
plt.subplot(312, sharex=index_train_ax)
index_validation_ax = index_validation_csv.get_plot()
plt.subplot(313, sharex=index_train_ax)
index_test_ax = index_test_csv.get_plot()
index_train_ax.xaxis.set_tick_params(which='both', labelbottom=True) # Restore x tick labels hidden by sharex
index_validation_ax.xaxis.set_tick_params(which='both', labelbottom=True) # Restore x tick labels hidden by sharex
plt.savefig(path.join(figures_dir, "dataset_hbar_plot.pdf"), bbox_inches = 'tight')
plt.show()
I have chosen the transfer learning technique to solve the given problem. In transfer learning, we build our model on top of pre-trained architectures whose training took supercomputers many hours. The technique rests on the fact that a neural network learns different features at different layers, and that a model pre-trained on one task can be reused for a similar task.
In this problem, I will use the VGG16 and Xception pre-trained models from Keras Applications, trained on the ImageNet dataset, without their top (fully connected) layers, as feature extractors. I will configure them to apply global average pooling to the output of their last convolutional blocks, so each network's output will be a 2D tensor. Then I will pass my training dataset through each network and save the resulting bottleneck features. Finally, I will create a top-model for each network, composed of fully connected layers, that takes such bottleneck features and outputs a vector corresponding to our landmark classes (here 10 classes). Transfer learning speeds up training and improves performance significantly.
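To make the bottleneck-feature idea concrete, here is a toy NumPy sketch (the backbone is a random stand-in for the frozen pre-trained network; all names are illustrative, not the project's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pre-trained backbone: it maps every input to a fixed
# 2048-dim "bottleneck" vector (the size Xception produces with pooling='avg')
# and is never updated during training.
W_frozen = rng.standard_normal((100, 2048)) * 0.01

def extract_bottleneck(x):
    return np.maximum(x @ W_frozen, 0.0)

x_train = rng.standard_normal((8, 100))  # 8 toy "images" as flat vectors
features = extract_bottleneck(x_train)   # computed once, then cached; only the
                                         # small top-model trains on them
print(features.shape)  # (8, 2048)
```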
I will compare my reused pre-trained models to a benchmark traditional convolutional neural network that I'll implement and train from scratch.
Our benchmark convolutional neural network is a simple stack of 3 convolutional layers with ReLU activation, each followed by max pooling, then global average pooling and dense layers. For the final output Dense layer, I used the softmax activation function to get a vector of probabilities in the $[0,1]$ range. This is very similar to the architectures that Yann LeCun advocated in the 1990s for image classification (Handwritten Zip Code Recognition with Multilayer Networks), except for the ReLU activations. I have also added a dropout layer to prevent overfitting.
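For reference, the softmax used in the output layer maps raw scores to a probability vector; a minimal NumPy sketch (illustrative, not Keras's implementation):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # raw outputs of the last Dense layer
probs = softmax(scores)
print(probs.round(3))  # each entry lies in [0, 1]
print(probs.sum())     # the entries sum to 1
```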
class BenchmarkDataProcessor:
def __init__(self, input_shape, batch_size, train_dir, validation_dir, test_dir):
self.train_datagen = ImageDataGenerator(
rescale = 1.0/255, #rescale pixel values from [0,255] to [0,1]
rotation_range=40,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
fill_mode='nearest')
self.test_datagen = ImageDataGenerator(rescale=1.0/255)
self.input_shape = input_shape
self.batch_size = batch_size
self.init_train_generator(train_dir)
self.init_validation_generator(validation_dir)
self.init_test_generator(test_dir)
def __init_generator(self, datagen, images_dir):
return datagen.flow_from_directory(
directory=images_dir , # this is the target directory
target_size=self.input_shape[:2], # all images will be resized to input shape 224x224
color_mode="rgb",
batch_size=self.batch_size,
class_mode='categorical',
shuffle=False)
def init_train_generator(self, train_dir):
self.train_generator = self.__init_generator(self.train_datagen, train_dir)
def init_validation_generator(self, validation_dir):
self.validation_generator = self.__init_generator(self.test_datagen, validation_dir)
def init_test_generator(self, test_dir):
self.test_generator = self.__init_generator(self.test_datagen, test_dir)
input_shape=(224, 224, 3)
batch_size = 16
benchmark_data_processor = BenchmarkDataProcessor(
input_shape,
batch_size,
train_dir,
validation_dir,
test_dir)
img = load_img(path.join(train_dir, "Kazan/3e222ad7d1469deb.jpg"))
img_array = img_to_array(img)
plt.imshow(img_array/255)
plt.title("Original Image")
plt.savefig(path.join(figures_dir, "augmented_image_original.pdf"), bbox_inches = 'tight')
plt.show()
#----------------------------
columns = 4
rows = 5
fig = plt.figure(figsize=(20,20))
for i, batch in enumerate(benchmark_data_processor.train_datagen.flow(img_array.reshape((1,) + img_array.shape), batch_size=16)):
if i > rows * columns - 1:
break
fig.add_subplot(rows, columns, i+1)
plt.imshow(batch[0])
plt.savefig(path.join(figures_dir, "augmented_image_transformations.pdf"), bbox_inches = 'tight')
plt.show()
# My benchmark model - a simple CNN
benchmark_model = Sequential()
benchmark_model.add(Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=input_shape))
benchmark_model.add(MaxPooling2D(pool_size=(2, 2)))
benchmark_model.add(Conv2D(32, (3, 3), padding='same', activation='relu'))
benchmark_model.add(MaxPooling2D(pool_size=(2, 2)))
benchmark_model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
benchmark_model.add(MaxPooling2D(pool_size=(2, 2)))
benchmark_model.add(GlobalAveragePooling2D())
#benchmark_model.add(Flatten())
benchmark_model.add(Dense(64, activation='relu'))
benchmark_model.add(Dropout(0.5))
benchmark_model.add(Dense(10, activation='softmax'))
benchmark_model.summary()
benchmark_model.compile(loss='categorical_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
# Save Model Architecture to file
plot_model(benchmark_model,
to_file=path.join(figures_dir, "benchmark_model_architecture.pdf"),
show_shapes=True,
show_layer_names=False
)
benchmark_weights_file = path.join(models_dir, "benchmark-model_weights.hdf5")
benchmark_history_file = path.join(models_dir, "benchmark-model_history.pickle")
#Checkpointer to save model best weights
benchmark_checkpointer = ModelCheckpoint(filepath = benchmark_weights_file,
monitor='val_acc',
verbose=1,
save_best_only=True)
#Early Stopping
early_stopping = EarlyStopping(monitor='val_acc',
verbose=1,
patience=10)
epochs = 50
steps_per_epoch = benchmark_data_processor.train_generator.samples // benchmark_data_processor.batch_size
validation_steps = benchmark_data_processor.validation_generator.samples // benchmark_data_processor.batch_size
#epochs = 5
#steps_per_epoch = 15
#validation_steps = 4
benchmark_model_history = benchmark_model.fit_generator(
benchmark_data_processor.train_generator,
steps_per_epoch=steps_per_epoch,
epochs=epochs,
callbacks = [benchmark_checkpointer, early_stopping],
validation_data=benchmark_data_processor.validation_generator,
validation_steps=validation_steps,
verbose=1)
# Save Model History
with open(benchmark_history_file, 'wb') as pickle_file:
pickle.dump(benchmark_model_history, pickle_file)
# Load Model History
with open(benchmark_history_file, 'rb') as pickle_file:
benchmark_model_history = pickle.load(pickle_file)
# Loading Best Weights
benchmark_model.load_weights(benchmark_weights_file)
def plot_learning_curves(model_history, model_name, plot_filename):
fig, ax = plt.subplots(2, 1)
fig.set_size_inches(8, 12)
#fig.suptitle(model_name + ' Performance Metrics')
ax[0].plot(model_history.history['acc'])
ax[0].plot(model_history.history['val_acc'])
ax[0].set_title(model_name + ' Performance Metrics\n' + 'Accuracy')
ax[0].legend(['Training', 'Validation'], loc='upper left')
ax[0].set_xlabel('Epoch')
ax[0].set_ylabel('Accuracy')
ax[0].grid(True)
ax[1].plot(model_history.history['loss'])
ax[1].plot(model_history.history['val_loss'])
ax[1].set_title('Loss')
ax[1].legend(['Training', 'Validation'], loc='upper right')
ax[1].set_xlabel('Epoch')
ax[1].set_ylabel('Loss')
ax[1].grid(True)
plt.savefig(plot_filename, bbox_inches = 'tight')
plt.show()
plot_learning_curves(benchmark_model_history, 'Benchmark Model', path.join(figures_dir, "benchmark_model_metrics.pdf"))
Accuracy :
[benchmark_test_loss, benchmark_test_accuracy] = benchmark_model.evaluate_generator(benchmark_data_processor.test_generator, steps=len(benchmark_data_processor.test_generator), workers=2, use_multiprocessing=True, verbose=1)
print(benchmark_model.metrics_names)
print([benchmark_test_loss, benchmark_test_accuracy])
GAP :
def get_GAP(model, test_generator):
test_generator.reset()
samples_count = test_generator.samples
images = np.empty((0,) + input_shape)
max_batch_index = len(test_generator)
i = 0
for batch in test_generator:
images = np.append(images, batch[0], axis=0) #image data
# print(batch[1][i]) #image labels
i += 1
if i > max_batch_index - 1:
break
probabilities = model.predict(images)
predicted_classes = model.predict_classes(images)
confidence_scores = probabilities[(np.arange(samples_count), predicted_classes)]
true_labels = test_generator.labels
return GAP_vector(predicted_classes, confidence_scores, true_labels)
benchmark_test_GAP = get_GAP(benchmark_model, benchmark_data_processor.test_generator)
print(benchmark_test_GAP)
Saving results :
stats_csv.add_stats("Benchmark Model", benchmark_test_loss, benchmark_test_accuracy, benchmark_test_GAP)
I defined a DataProcessor class that wraps the Keras ImageDataGenerator and exposes train_generator, validation_generator, and test_generator attributes that act as directory iterators, returning batches of images. This class also handles pixel value re-scaling to the range $[0,1]$.
The following code is the implementation of DataProcessor class :
class DataProcessor:
def __init__(self, input_shape, batch_size, train_dir, validation_dir, test_dir):
self.train_datagen = ImageDataGenerator(rescale = 1.0/255) #rescale pixel values from [0,255] to [0,1]
self.test_datagen = ImageDataGenerator(rescale=1.0/255)
self.input_shape = input_shape
self.batch_size = batch_size
self.init_train_generator(train_dir)
self.init_validation_generator(validation_dir)
self.init_test_generator(test_dir)
def __init_generator(self, datagen, images_dir):
return datagen.flow_from_directory(
directory=images_dir , # this is the target directory
target_size=self.input_shape[:2], # all images will be resized to the input shape
batch_size=self.batch_size,
class_mode=None,
shuffle=False)
def init_train_generator(self, train_dir):
self.train_generator = self.__init_generator(self.train_datagen, train_dir)
def init_validation_generator(self, validation_dir):
self.validation_generator = self.__init_generator(self.test_datagen, validation_dir)
def init_test_generator(self, test_dir):
self.test_generator = self.__init_generator(self.test_datagen, test_dir)
input_shape=(224, 224, 3)
batch_size = 1
data_processor = DataProcessor(
input_shape,
batch_size,
train_dir,
validation_dir,
test_dir)
I also defined a PretrainedModel class that wraps a pre-trained transfer-learning model and our defined top-model. It has methods for: computing and saving bottleneck features, creating and training the top-model, saving and loading the training history and best weights, evaluating test accuracy and GAP, and plotting learning curves.
The following code is the implementation of PretrainedModel class :
class PretrainedModel:
def __init__(self, model_name, pretrained_model, data_processor, save_path ):
self.model_name = model_name
self.pretrained_model = pretrained_model
self.data_processor = data_processor
self.save_path = save_path
def get_file_path(self, file_name):
return path.join(self.save_path, self.model_name + '_' + file_name)
def predict_bottleneck_features(self):
max_queue_size = 10 #default is 10
workers = 2 #default is 1
self.bottleneck_features_train = self.pretrained_model.predict_generator(
self.data_processor.train_generator,
steps=self.data_processor.train_generator.samples // self.data_processor.batch_size,
max_queue_size=max_queue_size,
workers=workers,
use_multiprocessing=True, #default is False
verbose=1)
self.bottleneck_features_validation = self.pretrained_model.predict_generator(
self.data_processor.validation_generator,
steps=self.data_processor.validation_generator.samples // self.data_processor.batch_size,
max_queue_size=max_queue_size,
workers=workers,
use_multiprocessing=True,
verbose=1)
self.bottleneck_features_test = self.pretrained_model.predict_generator(
self.data_processor.test_generator,
steps=self.data_processor.test_generator.samples // self.data_processor.batch_size,
max_queue_size=max_queue_size,
workers=workers,
use_multiprocessing=True,
verbose=1)
self.save_bottleneck_features()
def save_bottleneck_features(self):
np.save(open(self.get_file_path('bottleneck_features_train.npz'), 'wb'),
self.bottleneck_features_train)
np.save(open(self.get_file_path('bottleneck_features_validation.npz'), 'wb'),
self.bottleneck_features_validation)
np.save(open(self.get_file_path('bottleneck_features_test.npz'), 'wb'),
self.bottleneck_features_test)
def load_bottleneck_features(self):
self.bottleneck_features_train = np.load(open(self.get_file_path('bottleneck_features_train.npz'), 'rb'))
self.bottleneck_features_validation = np.load(open(self.get_file_path('bottleneck_features_validation.npz'), 'rb'))
self.bottleneck_features_test = np.load(open(self.get_file_path('bottleneck_features_test.npz'), 'rb'))
def create_top_model(self, optimizer):
'''
Defines a fully connected top-model with three Dense layers, whose input shape equals the
bottleneck feature shape. It takes bottleneck features as input and outputs a vector of 10
classes corresponding to our 10 landmark categories. A Dropout layer is also used to reduce
overfitting. ReLU activation is used for the first and second Dense layers, and softmax
activation for the last layer, producing a vector of probabilities with values between 0 and 1.
'''
self.top_model = Sequential()
self.top_model.add(Dense(256, activation='relu', input_shape=self.bottleneck_features_train.shape[1:])) #[1:]
self.top_model.add(Dense(128, activation='relu'))
self.top_model.add(Dropout(0.3))
self.top_model.add(Dense(10, activation='softmax'))
self.top_model.compile(optimizer=optimizer,
loss='categorical_crossentropy',
metrics=['accuracy'])
self.top_model.summary()
def save_top_model_graph(self, figures_path): # Save Top Model Architecture to file
plot_model(self.top_model,
to_file=path.join(figures_path, self.model_name +"_top-model_architecture.pdf"),
show_shapes=True,
show_layer_names=False
)
def train_top_model(self, epochs, batch_size):
early_stopping = EarlyStopping(monitor='val_acc', verbose=1, patience=100)
checkpointer = ModelCheckpoint(
filepath=self.get_file_path('top-model_weights.hdf5'),
monitor='val_acc',
verbose=0,
save_best_only=True)
train_labels = to_categorical(self.data_processor.train_generator.classes)
validation_labels = to_categorical(self.data_processor.validation_generator.classes)
self.history = self.top_model.fit(self.bottleneck_features_train,
train_labels,
epochs=epochs,
batch_size=batch_size,
validation_data=(self.bottleneck_features_validation, validation_labels),
callbacks=[checkpointer, early_stopping],
verbose=1)
def save_top_model_history(self):
with open(self.get_file_path('top-model_history.pickle'), 'wb') as pickle_file:
pickle.dump(self.history, pickle_file)
def load_top_model_history(self):
with open(self.get_file_path('top-model_history.pickle'), 'rb') as pickle_file:
self.history = pickle.load(pickle_file)
def load_top_model_weights(self):
self.top_model.load_weights(self.get_file_path('top-model_weights.hdf5'))
def test_top_model(self):
self.load_top_model_weights()
test_labels = to_categorical(self.data_processor.test_generator.classes)
stats = self.top_model.evaluate(self.bottleneck_features_test, test_labels, workers=2, use_multiprocessing=True, verbose=1)
print(stats)
return stats
def get_GAP(self):
samples_count = self.data_processor.test_generator.samples
probabilities = self.top_model.predict(self.bottleneck_features_test)
predicted_classes = self.top_model.predict_classes(self.bottleneck_features_test)
confidence_scores = probabilities[(np.arange(samples_count), predicted_classes)]
true_labels = self.data_processor.test_generator.labels
GAP = GAP_vector(predicted_classes, confidence_scores, true_labels)
print(GAP)
return GAP
def plot_learning_curves(self, save_path):
fig, ax = plt.subplots(1, 2)
fig.set_size_inches(16,4)
fig.suptitle(self.model_name + ' Performance Metrics')
ax[0].plot(self.history.history['acc'])
ax[0].plot(self.history.history['val_acc'])
ax[0].set_title('Accuracy')
ax[0].legend(['Training', 'Validation'], loc='upper left')
ax[0].set_xlabel('Epoch')
ax[0].set_ylabel('Accuracy')
ax[0].grid(True)
ax[1].plot(self.history.history['loss'])
ax[1].plot(self.history.history['val_loss'])
ax[1].set_title('Loss')
ax[1].legend(['Training', 'Validation'], loc='upper right')
ax[1].set_xlabel('Epoch')
ax[1].set_ylabel('Loss')
ax[1].grid(True)
plt.savefig(path.join(save_path, self.model_name +"_metrics.pdf"), bbox_inches = 'tight')
plt.show()
Create an instance of the PretrainedModel class, passing a VGG16 model instance without its top layers (include_top=False), with average pooling mode for feature extraction (pooling='avg'), and an object of the DataProcessor class:
VGG16_model = PretrainedModel("VGG16_model",
applications.VGG16(include_top=False, weights='imagenet',pooling='avg'),
data_processor,
models_dir)
VGG16_model.predict_bottleneck_features()
VGG16_model.load_bottleneck_features()
VGG16_model.create_top_model(optimizers.SGD(lr=0.01, clipnorm=1.,momentum=0.7))
# Save top-model architecture to file
VGG16_model.save_top_model_graph(figures_dir)
Fit (train) the model using the bottleneck features for 1000 epochs (we can choose a large number of epochs because training the dense layers is very fast, unlike our benchmark model, which has convolutional layers):
VGG16_model.train_top_model(epochs=1000, batch_size=4096)
VGG16_model.save_top_model_history()
VGG16_model.load_top_model_history()
VGG16_model.load_top_model_weights()
[test_loss, test_accuracy] = VGG16_model.test_top_model()
test_GAP = VGG16_model.get_GAP()
stats_csv.add_stats("VGG16 Model", test_loss, test_accuracy, test_GAP)
VGG16_model.plot_learning_curves(figures_dir)
Xception_model = PretrainedModel("Xception_model",
applications.Xception(include_top=False, weights='imagenet',pooling='avg'),
data_processor,
models_dir)
Xception_model.predict_bottleneck_features()
Xception_model.load_bottleneck_features()
Xception_model.create_top_model(optimizers.SGD(lr=0.01, clipnorm=1.,momentum=0.7))
# Save top-model architecture to file
Xception_model.save_top_model_graph(figures_dir)
Xception_model.train_top_model(epochs=1000, batch_size=4096)
Xception_model.save_top_model_history()
Xception_model.load_top_model_history()
Xception_model.load_top_model_weights()
[test_loss, test_accuracy] = Xception_model.test_top_model()
test_GAP = Xception_model.get_GAP()
stats_csv.add_stats("Xception Model", test_loss, test_accuracy, test_GAP)
Xception_model.plot_learning_curves(figures_dir)
For refinement, I recreated the VGG16 and Xception pre-trained models, but changed the top-model optimizer from stochastic gradient descent (SGD) to RMSprop, first proposed by Geoffrey Hinton in his lecture Neural Networks for Machine Learning - Lecture 6a - Overview of mini-batch gradient descent.
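For reference, RMSprop divides each parameter update by a running root-mean-square of recent gradients; a minimal NumPy sketch of the update rule (illustrative, not Keras's implementation):

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSprop update: scale the step by a running RMS of past gradients."""
    cache = rho * cache + (1 - rho) * grad ** 2
    return w - lr * grad / (np.sqrt(cache) + eps), cache

# Minimize f(w) = w^2 starting from w = 1.0; the gradient is 2w.
w, cache = np.array([1.0]), np.array([0.0])
for _ in range(100):
    w, cache = rmsprop_step(w, 2 * w, cache)
print(w)  # w has moved toward the minimum at 0
```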
VGG16_model_refinement = PretrainedModel("VGG16_model_refinement",
applications.VGG16(include_top=False, weights='imagenet',pooling='avg'),
data_processor,
models_dir)
VGG16_model_refinement.predict_bottleneck_features()
VGG16_model_refinement.load_bottleneck_features()
VGG16_model_refinement.create_top_model(optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0))
VGG16_model_refinement.train_top_model(epochs=1000, batch_size=4096)
VGG16_model_refinement.save_top_model_history()
VGG16_model_refinement.load_top_model_history()
VGG16_model_refinement.load_top_model_weights()
[test_loss, test_accuracy] = VGG16_model_refinement.test_top_model()
test_GAP = VGG16_model_refinement.get_GAP()
stats_csv.add_stats("VGG16 Model(Refined)", test_loss, test_accuracy, test_GAP)
VGG16_model_refinement.plot_learning_curves(figures_dir)
Xception_model_refinement = PretrainedModel("Xception_model_refinement",
applications.Xception(include_top=False, weights='imagenet',pooling='avg'),
data_processor,
models_dir)
Xception_model_refinement.predict_bottleneck_features()
Xception_model_refinement.load_bottleneck_features()
Xception_model_refinement.create_top_model(optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0))
Xception_model_refinement.train_top_model(epochs=1000, batch_size=4096)
Xception_model_refinement.save_top_model_history()
Xception_model_refinement.load_top_model_history()
Xception_model_refinement.load_top_model_weights()
[test_loss, test_accuracy] = Xception_model_refinement.test_top_model()
test_GAP = Xception_model_refinement.get_GAP()
stats_csv.add_stats("Xception Model(Refined)", test_loss, test_accuracy, test_GAP)
Xception_model_refinement.plot_learning_curves(figures_dir)
During model development, I used a validation set to evaluate the model while training. The validation set accuracy and loss scores were the basis for selecting the best top-model weights.
The final solution model chosen is Xception Model with the RMSprop optimizer. It was selected because it had the best performance on test accuracy and GAP scores. This model's performance is satisfactory given the nature and difficulty of the landmark recognition problem.
The following list describes our final solution model:
- A pre-trained Xception base network without its top layers (include_top=False), with average pooling mode for feature extraction (pooling='avg')
- A fully connected top-model trained on the extracted bottleneck features
- The RMSprop optimizer with learning rate lr=0.001

To validate the robustness of the solution model, we will train the model with a different input image size of $150\times 150$.
Trying a different input image shape: 150x150
evaluation_input_shape = (150, 150, 3)
evaluation_data_processor = DataProcessor(
evaluation_input_shape,
batch_size,
train_dir,
validation_dir,
test_dir)
evaluation_model = PretrainedModel("evaluation_model",
applications.Xception(include_top=False, weights='imagenet',pooling='avg'),
evaluation_data_processor,
models_dir)
evaluation_model.predict_bottleneck_features()
evaluation_model.load_bottleneck_features()
evaluation_model.create_top_model(optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0))
evaluation_model.train_top_model(epochs=1000, batch_size=4096)
evaluation_model.save_top_model_history()
#evaluation_model.load_top_model_history()
evaluation_model.load_top_model_weights()
[test_loss, test_accuracy] = evaluation_model.test_top_model()
test_GAP = evaluation_model.get_GAP()
#stats_csv.add_stats("Solution Model Validation", test_loss, test_accuracy, test_GAP)
evaluation_model.plot_learning_curves(figures_dir)
The test loss was 0.8985, the accuracy score 77.93%, and the GAP score 74.18%. The performance is slightly lower than, but not far from, that obtained with the $224\times 224$ input size.
The final solution model has 84.91% test accuracy score and 82.68% GAP score. Its performance is significantly higher than the benchmark model which had 62.29% test accuracy score and 55.85% GAP score.
I will test my final solution model on three unseen landmark images randomly downloaded from the web: Jami Masjid at Feroz Shah Kotla, the Cathedral of the Apostles Peter and Paul in Kazan, and the Golden Gate Bridge.
from tqdm import tqdm
from sklearn.datasets import load_files
def extract_VGG16(VGG16model, tensor):
from keras.applications.vgg16 import VGG16, preprocess_input
return VGG16model.predict(preprocess_input(tensor))
def extract_Xception(Xceptionmodel, tensor):
from keras.applications.xception import Xception, preprocess_input
return Xceptionmodel.predict(preprocess_input(tensor))
Here we assign our best solution model to the solution_model variable and its preprocessing/extraction function to the extract_bottleneck_features variable:
solution_model = Xception_model_refinement
extract_bottleneck_features = extract_Xception
def path_to_tensor(img_path):
# loads RGB image as PIL.Image.Image type
img = load_img(img_path, target_size=(224, 224))
# convert PIL.Image.Image type to 3D tensor with shape (224, 224, 3)
x = img_to_array(img)
# convert 3D tensor to 4D tensor with shape (1, 224, 224, 3) and return 4D tensor
return np.expand_dims(x, axis=0)
def paths_to_tensor(img_paths):
list_of_tensors = [path_to_tensor(img_path) for img_path in tqdm(img_paths)]
return np.vstack(list_of_tensors)
# Loading test image paths
test_data = load_files(test_images_dir)
test_images_paths = np.array(test_data['filenames'])
# Getting prediction labels using the solution model :
bottleneck_features = extract_bottleneck_features(solution_model.pretrained_model, paths_to_tensor(test_images_paths))
predicted_classes = solution_model.top_model.predict_classes(bottleneck_features)
predicted_labels = np.array(landmarks_list)[predicted_classes]
The following run shows that the solution model does a good job classifying the three images correctly.
# Plot test images and their predicted labels :
columns = len(predicted_labels)
rows = 1
fig = plt.figure(figsize=(20,20))
for i, image_path in enumerate(test_images_paths):
fig.add_subplot(rows, columns, i+1)
plt.imshow(load_img(image_path))
plt.title(predicted_labels[i])
plt.savefig(path.join(figures_dir, "visualization_predicted_images.pdf"), bbox_inches = 'tight')
plt.show()
Visually similar images from the training dataset that the model was trained on are shown below :
def get_train_image_path(image_id, label):
return path.join(data_dir, 'train', label, image_id + '.jpg')
similar_train_landmarks_ids =['Kazan','Feroz_Shah_Kotla', 'Golden_Gate_Bridge']
similar_train_images_ids = np.array([
['3e222ad7d1469deb', '33e8a9c8e96f20eb', '1586d22396244714'],
['1e33638e9fb39b0d', '2f7b8d029e1402b4', '1e4569e97ea7ade0'],
['5cd678ac220edb5a', '8c188787c993afba', '8e471206a5892d58'],
])
rows = similar_train_images_ids.shape[0]
columns = similar_train_images_ids.shape[1]
fig = plt.figure(figsize=(20,20))
for i, landmark in enumerate(similar_train_images_ids):
for j, image in enumerate(landmark):
image_path = get_train_image_path(str(image), str(similar_train_landmarks_ids[i]))
fig.add_subplot(rows, columns, columns*i + j + 1)
plt.imshow(load_img(image_path))
plt.title(similar_train_landmarks_ids[i])
plt.savefig(path.join(figures_dir, "visualization_training_samples.pdf"), bbox_inches = 'tight')
plt.show()
The entire end-to-end problem solution can be summarized as follows:
An interesting aspect of this project was that the transfer-learning models achieved excellent performance with short training times. Their performance was better than that of the benchmark model, which took much longer to train.
The challenging aspect of the project was the selection of classes and the extraction of a balanced subset dataset for the project from the large publicly available dataset. Another challenge was the implementation of the GAP performance metric and obtaining the correct input vectors.
As this project's implementation has shown, convolutional neural networks in general, and transfer-learning techniques in particular, are currently the best approach to image classification problems. They are highly recommended for such problems and similar ones.
To improve our solution model, my suggestions are the following:
cd /
%%shell
tar cvzf stats.tar.gz docs/
tar cvzf models.tar.gz models/
cp -v stats.tar.gz /gdrive/My\ Drive/shared/Udacity_MLND_capstone_dataset/
cp -v models.tar.gz /gdrive/My\ Drive/shared/Udacity_MLND_capstone_dataset/